Waveform Level Synthesis

Author

  • Qingyun Dou
Abstract

This thesis investigates waveform-level synthesis models, which directly generate audio waveforms. In contrast, traditional feature-level synthesis models generate vocoder feature sequences, which are then converted to waveforms. Feature-level synthesis is limited by several factors, including the quality of the vocoder, the fixed-length analysis window and the lack of expressiveness. Waveform-level synthesis models are not limited by these factors, as no vocoder is used to generate waveforms. In this thesis, both unconditional synthesis and conditional synthesis are investigated. For unconditional waveform-level synthesis, the major challenge is to model a long history. Two models are investigated: Hierarchical Recurrent Neural Network (HRNN) and Dilated Convolutional Neural Network (DCNN). HRNN models a long history with a stack of RNNs, each operating at a different time scale, while DCNN uses a stack of CNNs, each dilated to a different extent. Experiments are performed with HRNN, using both music and speech data. It is found that the model performs well in both cases, and that the structure of the network should be designed according to the time scales of different tiers. For conditional waveform-level synthesis, the major challenge is to incorporate extra information into the unconditional models. The analysis focuses on adding text information, which is more complicated and more general than other information such as music style. Two approaches are investigated: using standard text labels and using text labels generated by a neural network with an attention mechanism. A new conditional synthesis model is developed, combining HRNN and standard text labels. Experiments are performed with both the new model and a feature-level synthesis model. The results are analyzed in both the time domain and the feature domain.
It is found that the waveform-level synthesis model achieves performance comparable to the feature-level synthesis model, even with very limited tuning, and that having multiple tiers is essential to good performance.
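
As a rough illustration of why dilation helps a DCNN cover a long history, the sketch below computes the receptive field of a stack of dilated causal convolutions. It is not taken from the thesis: the helper name `receptive_field`, the kernel size of 2 and the doubling dilation schedule are assumptions chosen only for the example.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples (current plus past) visible to the top layer
    of a stack of dilated causal convolutions, one layer per dilation."""
    # Each layer extends the visible history by (kernel_size - 1) * dilation samples.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule: ten layers with dilations 1, 2, 4, ..., 512.
doubling = [2 ** i for i in range(10)]
# Baseline: ten layers without dilation.
undilated = [1] * 10

print(receptive_field(2, doubling))   # 1024
print(receptive_field(2, undilated))  # 11
```

Under these assumptions, ten dilated layers see 1024 samples while ten undilated layers see only 11, which is the sense in which dilating each layer "to a different extent" lets the model reach further back at the same depth.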

Similar resources

Waveform Design using Second Order Cone Programming in Radar Systems

Transmit waveform design is one of the most important problems in active sensing and communication systems. Because of its complexity and non-convexity, this problem has been the main topic of many papers for decades. However, an optimal solution that guarantees a global minimum for this multi-variable optimization problem has still not been found. In this paper, we propose an attracting met...

Formant-based synthesis of singing

Rule-driven formant synthesis is a legacy technique that still has certain advantages over currently prevailing methods. The memory footprint is small and the flexibility is high. Using a modular, interactive synthesis engine, it is easy to test the perceptual effect of different source waveform and formant filter configurations. The rule system allows the investigation of how different styles ...

Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters

This paper describes a novel framework for statistical parametric speech synthesis in which statistical modeling of the speech waveform is performed through the joint estimation of acoustic and excitation model parameters. The proposed method combines extraction of spectral parameters, considered as hidden variables, and excitation signal modeling in a fashion similar to factor analyzed traject...

Extended Waveform Segment Synthesis, a Nonstandard Synthesis Model for Microsound Composition

This paper discusses a non-standard technique for time-domain waveform synthesis. In Extended Waveform Segment Synthesis, sound is described as a structure of blocks of amplitude micro-fluctuations. These structures can be updated during synthesis, or different structures can be combined, generating dynamically evolving waveforms. This technique is intended to be: first, an extension of the existing li...

Rock Music: Granular and Stochastic Synthesis based on the Matanuska Glacier

Geological data from the measurement of sediment granulation over a 24-hour period in a lake formed by the Matanuska Glacier in Alaska were applied to synthesis parameters. The parameters measured include time, grain size and grain frequency. Several mapping strategies were employed to generate different types of additive and granular synthesis sound materials: 1) time to event grouping, sediment gr...

Publication date: 2017